Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 20402133 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.8 GiB |
| Average record size in memory | 199.0 B |
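The two memory figures can be cross-checked against each other. The sketch below is a back-of-the-envelope check, not part of the report itself: it assumes binary units (1 GiB = 2^30 bytes, 1 MiB = 2^20 bytes) and, for the per-column estimate, that a numeric column is stored as 8-byte values such as `int64` or `float64`. The variable names are illustrative.

```python
N_ROWS = 20_402_133        # number of observations from the table above
AVG_RECORD_BYTES = 199.0   # average record size in memory

# Total footprint implied by the average record size.
total_gib = N_ROWS * AVG_RECORD_BYTES / 2**30

# Assumption: one numeric column holds 8 bytes per row (int64 or float64).
numeric_column_mib = N_ROWS * 8 / 2**20

print(f"total: {total_gib:.1f} GiB")
print(f"one numeric column: {numeric_column_mib:.1f} MiB")
```

Under these assumptions the results round to 3.8 GiB and 155.7 MiB, the latter matching the memory size reported for each numeric variable.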
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 2 |
Date of Tweet has a high cardinality: 3207253 distinct values | High cardinality |
Language has a high cardinality: 66 distinct values | High cardinality |
User ID is highly correlated with Year Account Created | High correlation |
Following is highly correlated with Followers and 1 other fields | High correlation |
Followers is highly correlated with Following and 1 other fields | High correlation |
Total Tweets is highly correlated with Following and 1 other fields | High correlation |
Year Account Created is highly correlated with User ID | High correlation |
Following is highly correlated with Followers | High correlation |
Followers is highly correlated with Following and 1 other fields | High correlation |
Total Tweets is highly correlated with Followers | High correlation |
Unnamed: 0 is highly correlated with Tweet ID | High correlation |
Tweet ID is highly correlated with Unnamed: 0 | High correlation |
Following is highly skewed (γ1 = 43.0372489) | Skewed |
Followers is highly skewed (γ1 = 44.53262786) | Skewed |
Total Tweets is highly skewed (γ1 = 44.25165229) | Skewed |
Unnamed: 0 is uniformly distributed | Uniform |
Unnamed: 0 has unique values | Unique |
Followers has 506476 (2.5%) zeros | Zeros |
Retweet Count has 4277424 (21.0%) zeros | Zeros |
Reproduction
| Analysis started | 2022-04-16 02:27:03.907847 |
|---|---|
| Analysis finished | 2022-04-16 02:41:03.393673 |
| Duration | 13 minutes and 59.49 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
Unnamed: 0
Real number (ℝ≥0)
| Distinct | 20402133 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10201066 |
| Minimum | 0 |
|---|---|
| Maximum | 20402132 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1020106.6 |
| Q1 | 5100533 |
| median | 10201066 |
| Q3 | 15301599 |
| 95-th percentile | 19382025.4 |
| Maximum | 20402132 |
| Range | 20402132 |
| Interquartile range (IQR) | 10201066 |
Descriptive statistics
| Standard deviation | 5889588.634 |
|---|---|
| Coefficient of variation (CV) | 0.5773503116 |
| Kurtosis | -1.2 |
| Mean | 10201066 |
| Median Absolute Deviation (MAD) | 5100533 |
| Skewness | -4.302153923 × 10^-17 |
| Sum | 2.081235053 × 10^14 |
| Variance | 3.468725428 × 10^13 |
| Monotonicity | Strictly increasing |
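The quantile and descriptive statistics above are what one would expect from a plain row index running from 0 to n − 1. The sketch below checks this in closed form. It assumes the column holds each integer from 0 to 20402132 exactly once and that the variance uses the sample (n − 1) denominator; the function name `index_stats` is illustrative.

```python
import math

def index_stats(n: int) -> dict:
    """Closed-form summary statistics for the integers 0, 1, ..., n - 1."""
    mean = (n - 1) / 2            # midpoint of the range
    total = n * (n - 1) // 2      # arithmetic series sum (exact integer)
    variance = n * (n + 1) / 12   # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "sum": total,
        "variance": variance,
        "std": std,
        "cv": std / mean,         # tends to 1/sqrt(3) as n grows
        "p05": 0.05 * (n - 1),    # percentile by linear interpolation
    }

stats = index_stats(20_402_133)
for name, value in stats.items():
    print(f"{name}: {value:.10g}")
```

Under these assumptions the computed mean, sum, variance, standard deviation and coefficient of variation agree with the table to the displayed precision, which suggests that `Unnamed: 0` is a saved row index rather than a measured variable.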
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 13601421 | 1 | < 0.1% |
| 13601428 | 1 | < 0.1% |
| 13601427 | 1 | < 0.1% |
| 13601426 | 1 | < 0.1% |
| 13601425 | 1 | < 0.1% |
| 13601424 | 1 | < 0.1% |
| 13601423 | 1 | < 0.1% |
| 13601422 | 1 | < 0.1% |
| 13601420 | 1 | < 0.1% |
| Other values (20402123) | 20402123 | > 99.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 20402132 | 1 | < 0.1% |
| 20402131 | 1 | < 0.1% |
| 20402130 | 1 | < 0.1% |
| 20402129 | 1 | < 0.1% |
| 20402128 | 1 | < 0.1% |
| 20402127 | 1 | < 0.1% |
| 20402126 | 1 | < 0.1% |
| 20402125 | 1 | < 0.1% |
| 20402124 | 1 | < 0.1% |
| 20402123 | 1 | < 0.1% |
User ID
Real number (ℝ≥0)
| Distinct | 3478339 |
|---|---|
| Distinct (%) | 17.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.341856004 × 10^17 |
| Minimum | 76 |
|---|---|
| Maximum | 1.51402441 × 10^18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 76 |
|---|---|
| 5-th percentile | 35672554 |
| Q1 | 491304400 |
| median | 7.447845409 × 10^17 |
| Q3 | 1.305252611 × 10^18 |
| 95-th percentile | 1.497324963 × 10^18 |
| Maximum | 1.51402441 × 10^18 |
| Range | 1.51402441 × 10^18 |
| Interquartile range (IQR) | 1.30525261 × 10^18 |
Descriptive statistics
| Standard deviation | 6.390079389 × 10^17 |
|---|---|
| Coefficient of variation (CV) | 1.007603986 |
| Kurtosis | -1.790794388 |
| Mean | 6.341856004 × 10^17 |
| Median Absolute Deviation (MAD) | 7.44784538 × 10^17 |
| Skewness | 0.1376486544 |
| Sum | 8.204694187 × 10^18 |
| Variance | 4.08331146 × 10^35 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 1.499763124 × 10^18 | 18859 | 0.1% |
| 2453069245 | 16702 | 0.1% |
| 1.203552378 × 10^18 | 9969 | < 0.1% |
| 31077930 | 8685 | < 0.1% |
| 247619342 | 8544 | < 0.1% |
| 88196314 | 7481 | < 0.1% |
| 4230938057 | 6857 | < 0.1% |
| 1.066275283 × 10^18 | 6263 | < 0.1% |
| 1.260888403 × 10^18 | 5429 | < 0.1% |
| 1.216550422 × 10^18 | 5375 | < 0.1% |
| Other values (3478329) | 20307969 | 99.5% |

| Value | Count | Frequency (%) |
|---|---|---|
| 76 | 1 | < 0.1% |
| 221 | 3 | < 0.1% |
| 224 | 1 | < 0.1% |
| 324 | 1 | < 0.1% |
| 418 | 2 | < 0.1% |
| 422 | 2 | < 0.1% |
| 509 | 6 | < 0.1% |
| 521 | 3 | < 0.1% |
| 556 | 2 | < 0.1% |
| 614 | 2 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1.51402441 × 10^18 | 4 | < 0.1% |
| 1.514011053 × 10^18 | 1 | < 0.1% |
| 1.514010489 × 10^18 | 1 | < 0.1% |
| 1.514007089 × 10^18 | 3 | < 0.1% |
| 1.514005262 × 10^18 | 1 | < 0.1% |
| 1.514005057 × 10^18 | 1 | < 0.1% |
| 1.514001594 × 10^18 | 1 | < 0.1% |
| 1.514001536 × 10^18 | 1 | < 0.1% |
| 1.514001 × 10^18 | 1 | < 0.1% |
| 1.514000789 × 10^18 | 1 | < 0.1% |
Following
Real number (ℝ≥0)
| Distinct | 60618 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1734.896307 |
| Minimum | 0 |
|---|---|
| Maximum | 2312359 |
| Zeros | 161113 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 15 |
| Q1 | 151 |
| median | 528 |
| Q3 | 1662 |
| 95-th percentile | 5005 |
| Maximum | 2312359 |
| Range | 2312359 |
| Interquartile range (IQR) | 1511 |
Descriptive statistics
| Standard deviation | 6355.42819 |
|---|---|
| Coefficient of variation (CV) | 3.66328994 |
| Kurtosis | 5612.044669 |
| Mean | 1734.896307 |
| Median Absolute Deviation (MAD) | 460 |
| Skewness | 43.0372489 |
| Sum | 3.53955852 × 10^10 |
| Variance | 40391467.48 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 161113 | 0.8% |
| 5001 | 131737 | 0.6% |
| 1 | 129055 | 0.6% |
| 2 | 77887 | 0.4% |
| 3 | 73059 | 0.4% |
| 5000 | 73043 | 0.4% |
| 6 | 56754 | 0.3% |
| 5002 | 56454 | 0.3% |
| 5 | 54333 | 0.3% |
| 13 | 54183 | 0.3% |
| Other values (60608) | 19534515 | 95.7% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 161113 | 0.8% |
| 1 | 129055 | 0.6% |
| 2 | 77887 | 0.4% |
| 3 | 73059 | 0.4% |
| 4 | 53920 | 0.3% |
| 5 | 54333 | 0.3% |
| 6 | 56754 | 0.3% |
| 7 | 53230 | 0.3% |
| 8 | 46085 | 0.2% |
| 9 | 44533 | 0.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 2312359 | 1 | < 0.1% |
| 2312095 | 1 | < 0.1% |
| 1424263 | 1 | < 0.1% |
| 1424244 | 1 | < 0.1% |
| 1424235 | 1 | < 0.1% |
| 1424234 | 1 | < 0.1% |
| 1424227 | 1 | < 0.1% |
| 1424210 | 1 | < 0.1% |
| 1424199 | 1 | < 0.1% |
| 1424157 | 1 | < 0.1% |
Followers
Real number (ℝ≥0)
| Distinct | 214828 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12836.24223 |
| Minimum | 0 |
|---|---|
| Maximum | 52767632 |
| Zeros | 506476 |
| Zeros (%) | 2.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 61 |
| median | 321 |
| Q3 | 1361 |
| 95-th percentile | 10617 |
| Maximum | 52767632 |
| Range | 52767632 |
| Interquartile range (IQR) | 1300 |
Descriptive statistics
| Standard deviation | 280304.5629 |
|---|---|
| Coefficient of variation (CV) | 21.83696426 |
| Kurtosis | 2387.985147 |
| Mean | 12836.24223 |
| Median Absolute Deviation (MAD) | 307 |
| Skewness | 44.53262786 |
| Sum | 2.618867213 × 10^11 |
| Variance | 7.857064799 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 506476 | 2.5% |
| 1 | 285616 | 1.4% |
| 2 | 228759 | 1.1% |
| 3 | 190880 | 0.9% |
| 4 | 167885 | 0.8% |
| 5 | 153599 | 0.8% |
| 6 | 141173 | 0.7% |
| 7 | 130745 | 0.6% |
| 8 | 119371 | 0.6% |
| 9 | 110652 | 0.5% |
| Other values (214818) | 18366977 | 90.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 506476 | 2.5% |
| 1 | 285616 | 1.4% |
| 2 | 228759 | 1.1% |
| 3 | 190880 | 0.9% |
| 4 | 167885 | 0.8% |
| 5 | 153599 | 0.8% |
| 6 | 141173 | 0.7% |
| 7 | 130745 | 0.6% |
| 8 | 119371 | 0.6% |
| 9 | 110652 | 0.5% |

| Value | Count | Frequency (%) |
|---|---|---|
| 52767632 | 1 | < 0.1% |
| 52013313 | 1 | < 0.1% |
| 47189262 | 1 | < 0.1% |
| 31286187 | 1 | < 0.1% |
| 31284443 | 1 | < 0.1% |
| 26455115 | 1 | < 0.1% |
| 26377492 | 1 | < 0.1% |
| 26354172 | 1 | < 0.1% |
| 26348013 | 1 | < 0.1% |
| 24934027 | 1 | < 0.1% |
Total Tweets
Real number (ℝ≥0)
| Distinct | 645207 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 54764.88325 |
| Minimum | 0 |
|---|---|
| Maximum | 47010526 |
| Zeros | 151 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 104 |
| Q1 | 2161 |
| median | 11809 |
| Q3 | 49365 |
| 95-th percentile | 247856 |
| Maximum | 47010526 |
| Range | 47010526 |
| Interquartile range (IQR) | 47204 |
Descriptive statistics
| Standard deviation | 142835.4966 |
|---|---|
| Coefficient of variation (CV) | 2.608158516 |
| Kurtosis | 12149.91797 |
| Mean | 54764.88325 |
| Median Absolute Deviation (MAD) | 11322 |
| Skewness | 44.25165229 |
| Sum | 1.117320432 × 10^12 |
| Variance | 2.040197909 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 26440 | 0.1% |
| 2 | 24040 | 0.1% |
| 3 | 21993 | 0.1% |
| 4 | 20547 | 0.1% |
| 5 | 19569 | 0.1% |
| 6 | 19199 | 0.1% |
| 7 | 17891 | 0.1% |
| 8 | 17441 | 0.1% |
| 25 | 17207 | 0.1% |
| 9 | 16974 | 0.1% |
| Other values (645197) | 20200832 | 99.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 151 | < 0.1% |
| 1 | 26440 | 0.1% |
| 2 | 24040 | 0.1% |
| 3 | 21993 | 0.1% |
| 4 | 20547 | 0.1% |
| 5 | 19569 | 0.1% |
| 6 | 19199 | 0.1% |
| 7 | 17891 | 0.1% |
| 8 | 17441 | 0.1% |
| 9 | 16974 | 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 47010526 | 2 | < 0.1% |
| 47010202 | 2 | < 0.1% |
| 47001594 | 1 | < 0.1% |
| 47001553 | 1 | < 0.1% |
| 47000936 | 1 | < 0.1% |
| 47000906 | 1 | < 0.1% |
| 47000010 | 1 | < 0.1% |
| 46999969 | 1 | < 0.1% |
| 46999645 | 1 | < 0.1% |
| 46998997 | 1 | < 0.1% |
Tweet ID
Real number (ℝ≥0)
| Distinct | 20314584 |
|---|---|
| Distinct (%) | 99.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.505113445 × 10^18 |
| Minimum | 1.496738675 × 10^18 |
|---|---|
| Maximum | 1.514030604 × 10^18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 1.496738675 × 10^18 |
|---|---|
| 5-th percentile | 1.497619265 × 10^18 |
| Q1 | 1.50082366 × 10^18 |
| median | 1.504844726 × 10^18 |
| Q3 | 1.509408762 × 10^18 |
| 95-th percentile | 1.513086567 × 10^18 |
| Maximum | 1.514030604 × 10^18 |
| Range | 1.729192897 × 10^16 |
| Interquartile range (IQR) | 8.585102103 × 10^15 |
Descriptive statistics
| Standard deviation | 4.939617604 × 10^15 |
|---|---|
| Coefficient of variation (CV) | 0.003281890559 |
| Kurtosis | -1.176191804 |
| Mean | 1.505113445 × 10^18 |
| Median Absolute Deviation (MAD) | 4.22642174 × 10^15 |
| Skewness | 0.104753572 |
| Sum | 4.581629262 × 10^18 |
| Variance | 2.439982208 × 10^31 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 1.497764193 × 10^18 | 2 | < 0.1% |
| 1.497764237 × 10^18 | 2 | < 0.1% |
| 1.497764222 × 10^18 | 2 | < 0.1% |
| 1.497764222 × 10^18 | 2 | < 0.1% |
| 1.497764222 × 10^18 | 2 | < 0.1% |
| 1.497764222 × 10^18 | 2 | < 0.1% |
| 1.497764222 × 10^18 | 2 | < 0.1% |
| 1.497764222 × 10^18 | 2 | < 0.1% |
| 1.497764223 × 10^18 | 2 | < 0.1% |
| 1.497764223 × 10^18 | 2 | < 0.1% |
| Other values (20314574) | 20402113 | > 99.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1.496738675 × 10^18 | 1 | < 0.1% |
| 1.496738675 × 10^18 | 1 | < 0.1% |
| 1.496738676 × 10^18 | 1 | < 0.1% |
| 1.496738676 × 10^18 | 1 | < 0.1% |
| 1.496738676 × 10^18 | 1 | < 0.1% |
| 1.496738676 × 10^18 | 1 | < 0.1% |
| 1.496738676 × 10^18 | 1 | < 0.1% |
| 1.496738677 × 10^18 | 1 | < 0.1% |
| 1.496738677 × 10^18 | 1 | < 0.1% |
| 1.496738677 × 10^18 | 1 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1.514030604 × 10^18 | 1 | < 0.1% |
| 1.514030603 × 10^18 | 1 | < 0.1% |
| 1.514030603 × 10^18 | 1 | < 0.1% |
| 1.514030603 × 10^18 | 1 | < 0.1% |
| 1.514030602 × 10^18 | 1 | < 0.1% |
| 1.514030602 × 10^18 | 1 | < 0.1% |
| 1.5140306 × 10^18 | 1 | < 0.1% |
| 1.5140306 × 10^18 | 1 | < 0.1% |
| 1.514030599 × 10^18 | 1 | < 0.1% |
| 1.514030598 × 10^18 | 1 | < 0.1% |
Date of Tweet
Categorical
| Distinct | 3207253 |
|---|---|
| Distinct (%) | 15.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.4 GiB |
| 2022-03-28 03:12:22 | 417 |
|---|---|
| 2022-03-28 03:12:21 | 398 |
| 2022-03-28 03:12:24 | 394 |
| 2022-04-04 03:08:11 | 394 |
| 2022-04-04 02:35:07 | 393 |
| Other values (3207248) | 20400137 |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 288044 |
|---|---|
| Unique (%) | 1.4% |
Sample
| 1st row | 2022-04-01 00:00:00 |
|---|---|
| 2nd row | 2022-04-01 00:00:00 |
| 3rd row | 2022-04-01 00:00:00 |
| 4th row | 2022-04-01 00:00:00 |
| 5th row | 2022-04-01 00:00:00 |
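Every sampled value follows one fixed 19-character layout, which is consistent with the column being profiled as text rather than as a date. As a minimal sketch, assuming the whole column uses the `YYYY-MM-DD HH:MM:SS` pattern seen in the sample rows, the strings can be parsed with the standard library. `parse_tweet_time` is a hypothetical helper name, not something the report defines.

```python
from datetime import datetime

# Assumption: every value matches the layout of the sample rows.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_tweet_time(raw: str) -> datetime:
    """Parse one 'Date of Tweet' string; raises ValueError on other layouts."""
    return datetime.strptime(raw, TIMESTAMP_FORMAT)

sample = "2022-04-01 00:00:00"
parsed = parse_tweet_time(sample)
print(len(sample), parsed.isoformat())
```

Parsing the column this way before profiling would let it be treated as a date variable instead of a high-cardinality categorical one.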
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2022-03-28 03:12:22 | 417 | < 0.1% |
| 2022-03-28 03:12:21 | 398 | < 0.1% |
| 2022-03-28 03:12:24 | 394 | < 0.1% |
| 2022-04-04 03:08:11 | 394 | < 0.1% |
| 2022-04-04 02:35:07 | 393 | < 0.1% |
| 2022-03-28 03:17:45 | 391 | < 0.1% |
| 2022-04-04 02:35:05 | 388 | < 0.1% |
| 2022-04-04 03:20:13 | 387 | < 0.1% |
| 2022-04-04 03:20:16 | 387 | < 0.1% |
| 2022-03-28 03:17:44 | 384 | < 0.1% |
| Other values (3207243) | 20398200 | > 99.9% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
|---|---|---|
| 2022-03-07 | 567745 | 1.4% |
| 2022-03-06 | 566767 | 1.4% |
| 2022-03-05 | 546780 | 1.3% |
| 2022-03-08 | 519385 | 1.3% |
| 2022-03-09 | 493857 | 1.2% |
| 2022-03-15 | 484221 | 1.2% |
| 2022-03-21 | 480519 | 1.2% |
| 2022-03-04 | 480290 | 1.2% |
| 2022-03-17 | 468632 | 1.1% |
| 2022-03-18 | 468098 | 1.1% |
| Other values (86438) | 35727972 | 87.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
Retweet Count
Real number (ℝ≥0)
| Distinct | 63601 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1198.569732 |
| Minimum | 0 |
|---|---|
| Maximum | 2952269 |
| Zeros | 4277424 |
| Zeros (%) | 21.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 29 |
| Q3 | 309 |
| 95-th percentile | 4684 |
| Maximum | 2952269 |
| Range | 2952269 |
| Interquartile range (IQR) | 308 |
Descriptive statistics
| Standard deviation | 6461.94484 |
|---|---|
| Coefficient of variation (CV) | 5.391379963 |
| Kurtosis | 2442.363763 |
| Mean | 1198.569732 |
| Median Absolute Deviation (MAD) | 29 |
| Skewness | 18.38342198 |
| Sum | 2.445337909 × 10^10 |
| Variance | 41756731.11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4277424 | 21.0% |
| 1 | 1265031 | 6.2% |
| 2 | 693333 | 3.4% |
| 3 | 483402 | 2.4% |
| 4 | 373359 | 1.8% |
| 5 | 304281 | 1.5% |
| 6 | 258516 | 1.3% |
| 7 | 225900 | 1.1% |
| 8 | 201466 | 1.0% |
| 9 | 180061 | 0.9% |
| Other values (63591) | 12139360 | 59.5% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4277424 | 21.0% |
| 1 | 1265031 | 6.2% |
| 2 | 693333 | 3.4% |
| 3 | 483402 | 2.4% |
| 4 | 373359 | 1.8% |
| 5 | 304281 | 1.5% |
| 6 | 258516 | 1.3% |
| 7 | 225900 | 1.1% |
| 8 | 201466 | 1.0% |
| 9 | 180061 | 0.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 2952269 | 1 | < 0.1% |
| 436782 | 2 | < 0.1% |
| 436781 | 2 | < 0.1% |
| 436778 | 5 | < 0.1% |
| 436774 | 1 | < 0.1% |
| 436772 | 1 | < 0.1% |
| 436769 | 1 | < 0.1% |
| 436768 | 4 | < 0.1% |
| 436767 | 5 | < 0.1% |
| 436764 | 4 | < 0.1% |
Language
Categorical
| Distinct | 66 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 GiB |
| en | 13793384 |
|---|---|
| fr | 984332 |
| de | 926169 |
| it | 854292 |
| und | 832856 |
| Other values (61) | 3011100 |
Length
| Max length | 3 |
|---|---|
| Median length | 2 |
| Mean length | 2.040849307 |
| Min length | 2 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | en |
|---|---|
| 2nd row | en |
| 3rd row | en |
| 4th row | en |
| 5th row | en |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| en | 13793384 | 67.6% |
| fr | 984332 | 4.8% |
| de | 926169 | 4.5% |
| it | 854292 | 4.2% |
| und | 832856 | 4.1% |
| es | 738594 | 3.6% |
| th | 287577 | 1.4% |
| uk | 245309 | 1.2% |
| pl | 197337 | 1.0% |
| tr | 196654 | 1.0% |
| Other values (56) | 1345629 | 6.6% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
|---|---|---|
| en | 13793384 | 67.6% |
| fr | 984332 | 4.8% |
| de | 926169 | 4.5% |
| it | 854292 | 4.2% |
| und | 832856 | 4.1% |
| es | 738594 | 3.6% |
| th | 287577 | 1.4% |
| uk | 245309 | 1.2% |
| pl | 197337 | 1.0% |
| tr | 196654 | 1.0% |
| Other values (56) | 1345629 | 6.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
Year Account Created
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 18 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2015.789509 |
| Minimum | 1970 |
|---|---|
| Maximum | 2022 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 155.7 MiB |
Quantile statistics
| Minimum | 1970 |
|---|---|
| 5-th percentile | 2009 |
| Q1 | 2012 |
| median | 2016 |
| Q3 | 2020 |
| 95-th percentile | 2022 |
| Maximum | 2022 |
| Range | 52 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 4.500545935 |
|---|---|
| Coefficient of variation (CV) | 0.002232646769 |
| Kurtosis | -1.381911647 |
| Mean | 2015.789509 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.1234495251 |
| Sum | 4.112640566 × 10^10 |
| Variance | 20.25491371 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=18)
| Value | Count | Frequency (%) |
|---|---|---|
| 2021 | 2478516 | 12.1% |
| 2022 | 2051629 | 10.1% |
| 2020 | 1845686 | 9.0% |
| 2009 | 1720640 | 8.4% |
| 2011 | 1573615 | 7.7% |
| 2012 | 1386073 | 6.8% |
| 2014 | 1232417 | 6.0% |
| 2010 | 1209564 | 5.9% |
| 2019 | 1195174 | 5.9% |
| 2013 | 1194260 | 5.9% |
| Other values (8) | 4514559 | 22.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1970 | 14 | < 0.1% |
| 2006 | 3161 | < 0.1% |
| 2007 | 75373 | 0.4% |
| 2008 | 319643 | 1.6% |
| 2009 | 1720640 | 8.4% |
| 2010 | 1209564 | 5.9% |
| 2011 | 1573615 | 7.7% |
| 2012 | 1386073 | 6.8% |
| 2013 | 1194260 | 5.9% |
| 2014 | 1232417 | 6.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| 2022 | 2051629 | 10.1% |
| 2021 | 2478516 | 12.1% |
| 2020 | 1845686 | 9.0% |
| 2019 | 1195174 | 5.9% |
| 2018 | 987842 | 4.8% |
| 2017 | 1108220 | 5.4% |
| 2016 | 1000913 | 4.9% |
| 2015 | 1019393 | 5.0% |
| 2014 | 1232417 | 6.0% |
| 2013 | 1194260 | 5.9% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
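The rank-then-correlate recipe can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code the report ran: tied values receive the average of their ranks, and the function names and the toy `followers` and `following` lists are made up for the example.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: monotonic but far from linear.
followers = [0, 3, 10, 250, 90000]
following = [1, 2, 5, 40, 41]
rho = spearman(followers, following)
r = pearson(followers, following)
print(f"rho={rho:.3f}, r={r:.3f}")
```

On the toy data the ranks agree perfectly, so ρ is essentially 1 while r is noticeably lower, which is the behaviour the paragraph describes for skewed, monotonic relationships.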
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
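As a small illustration of the formula and of the invariance property, the sketch below computes r directly and then again after shifting and rescaling both variables. The data and the name `pearson_r` are invented for the example, and the scale factors are assumed to be positive.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/(n - 1) factors in the covariance and in the two standard
    deviations cancel, so plain sums of deviations are enough.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Separate changes of location and (positive) scale for each variable.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(f"r={r:.6f}, after shift/scale={r_shifted:.6f}")
```

Both calls return the same value up to floating-point error, since the offsets drop out of the deviations and the positive scale factors cancel between numerator and denominator.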
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
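The pair-counting definition translates almost directly into code. The sketch below implements the formula exactly as worded above (often called τ-a), counting tied pairs as neither concordant nor discordant; this is an assumption, since statistical libraries often default to the tie-adjusted τ-b variant, which can differ when ties are present. The name `kendall_tau_a` is illustrative, and the quadratic loop is only practical for small samples.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, checking every pair once."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
        # direction == 0 means a tie in x or y: counted in neither group
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Four observations: six pairs, five concordant and one discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(f"tau={tau:.4f}")
```

With five concordant pairs and one discordant pair out of six, the example evaluates to (5 − 1) / 6.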
Phik (φk)
Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
Missing values
The count chart is a simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | Unnamed: 0 | User ID | Following | Followers | Total Tweets | Tweet ID | Date of Tweet | Retweet Count | Language | Year Account Created |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 16882774 | 1158 | 392 | 88366 | 1509681950042198030 | 2022-04-01 00:00:00 | 3412 | en | 2008 |
| 1 | 1 | 3205296069 | 122 | 881 | 99853 | 1509681950151348229 | 2022-04-01 00:00:00 | 100 | en | 2015 |
| 2 | 2 | 1235940869812809728 | 231 | 72 | 5481 | 1509681950683926556 | 2022-04-01 00:00:00 | 9 | en | 2020 |
| 3 | 3 | 1347985375566966784 | 399 | 377 | 301 | 1509681951116046336 | 2022-04-01 00:00:00 | 573 | en | 2021 |
| 4 | 4 | 1505394816636846083 | 158 | 25 | 8982 | 1509681951304990720 | 2022-04-01 00:00:00 | 190 | en | 2022 |
| 5 | 5 | 799652508771766274 | 766 | 2024 | 4601 | 1509681952000937999 | 2022-04-01 00:00:00 | 1 | en | 2016 |
| 6 | 6 | 1280648773346066432 | 1343 | 549 | 67966 | 1509681952978210849 | 2022-04-01 00:00:00 | 5 | en | 2020 |
| 7 | 7 | 17673635 | 70 | 2033266 | 394746 | 1509681953053843466 | 2022-04-01 00:00:00 | 2 | en | 2008 |
| 8 | 8 | 46671396 | 2042 | 221952 | 93320 | 1509681953091457035 | 2022-04-01 00:00:00 | 3 | en | 2009 |
| 9 | 9 | 1275475606684172290 | 167 | 6102 | 6651 | 1509681953418711050 | 2022-04-01 00:00:00 | 0 | en | 2020 |
Last rows
| | Unnamed: 0 | User ID | Following | Followers | Total Tweets | Tweet ID | Date of Tweet | Retweet Count | Language | Year Account Created |
|---|---|---|---|---|---|---|---|---|---|---|
| 20402123 | 20402123 | 1502125154461233152 | 793 | 758 | 2409 | 1509677719549845506 | 2022-03-31 23:43:11 | 3 | es | 2022 |
| 20402124 | 20402124 | 787156255089360896 | 2436 | 1295 | 69526 | 1509677720388931589 | 2022-03-31 23:43:11 | 2845 | en | 2016 |
| 20402125 | 20402125 | 863000459576881152 | 5001 | 4101 | 418594 | 1509677720514433028 | 2022-03-31 23:43:11 | 16 | en | 2017 |
| 20402126 | 20402126 | 1478475898689134592 | 320 | 12 | 1849 | 1509677721659527168 | 2022-03-31 23:43:11 | 10 | und | 2022 |
| 20402127 | 20402127 | 1183429966697910272 | 414 | 150 | 6854 | 1509677722590605313 | 2022-03-31 23:43:12 | 3 | en | 2019 |
| 20402128 | 20402128 | 1502237057195945984 | 482 | 348 | 6583 | 1509677723136020480 | 2022-03-31 23:43:12 | 53 | en | 2022 |
| 20402129 | 20402129 | 823139604115001344 | 416 | 845 | 31521 | 1509677724490641410 | 2022-03-31 23:43:12 | 35 | en | 2017 |
| 20402130 | 20402130 | 1502028100967624704 | 39 | 41 | 3234 | 1509677724880703495 | 2022-03-31 23:43:12 | 67 | ar | 2022 |
| 20402131 | 20402131 | 1126857882308304896 | 309 | 199 | 3456 | 1509677724914569220 | 2022-03-31 23:43:12 | 0 | ja | 2019 |
| 20402132 | 20402132 | 1465798903152922625 | 865 | 452 | 2595 | 1509677727288373255 | 2022-03-31 23:43:13 | 7 | en | 2021 |